Skip to content

Make analytics-engine an optional dependency #5403

Open
bowenlan-amzn wants to merge 1 commit intoopensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:analytics-optional
Open

Make analytics-engine an optional dependency #5403
bowenlan-amzn wants to merge 1 commit intoopensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:analytics-optional

Conversation

@bowenlan-amzn
Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn commented May 4, 2026

Summary

Makes the analytics-engine plugin an optional dependency of opensearch-sql. Installing opensearch-sql on a distribution without analytics-engine now succeeds; all PPL/SQL queries route through the in-plugin v2 Calcite engine (Lucene-backed). When analytics-engine is installed, the existing Mustang path (parquet-backed indices → QueryPlanExecutor) continues to work.

Approach

  • extendedPlugins = ['opensearch-job-scheduler', 'analytics-engine;optional=true']
  • Move QueryPlanExecutor from a constructor parameter to an @Inject(optional = true) setter. Guice invokes the setter only when the binding exists; absent analytics-engine it's silently skipped and unifiedQueryHandler stays null. PPL routing then falls through to the v2 Calcite-to-OpenSearch path unconditionally.
  • Promote analytics-framework.jar from compileOnly to api so QueryPlanExecutor/SchemaProvider resolve from sql's own classpath when analytics-engine is absent. analytics-engine.jar stays compileOnly — its classes (e.g. OpenSearchSchemaBuilder) are reached only through RestUnifiedQueryAction, which never loads when the setter is skipped.
  • Drop the bundlePlugin { exclude '*.jar' } block. Required so sql's own classpath is self-sufficient when analytics-engine is absent (guava, calcite, jackson, etc. must resolve from sql's own jars). See "Why the exclude block has to go" below for the full tradeoff.

Addressing the diff analyzer findings

All three flagged items are intentional and follow documented OpenSearch plugin mechanisms:

  • ;optional=true syntax — parsed in PluginInfo.java:229 ("optional=true".equals(dependency[1])). Used in production elsewhere in the OpenSearch plugin ecosystem. Not ad-hoc syntax.
  • api promotion for analytics-framework — required so that sql's own classloader can resolve QueryPlanExecutor/SchemaProvider when analytics-engine is absent. When it IS present, the class-identity question is resolved by parent-first delegation: analytics-engine supplies the parent classloader (via extendedPlugins), and its copy of every framework class wins. sql's bundled jar is dead code in that scenario.
  • Removed exclude block — see below. The short answer: required for standalone install, accepted by OpenSearch for optional deps (PluginsService.java:763 skips the jar-hell cross-check), runtime collisions prevented by parent-first classloader delegation. Rests on a version-consistency assumption documented below.

Why the exclude block has to go

The excluded jars (guava, calcite, jackson, commons-*, etc.) are transitive deps sql needs to run. Previously sql didn't bundle them because analytics-engine was a required dep and shipped them — the shared classloader supplied sql at runtime. With ;optional=true sql can't rely on that parent being present, so it has to carry its own copies or it won't load at all when analytics-engine is absent.

How OpenSearch accepts the duplicates when both plugins are installed:

  • PluginsService.checkBundleJarHell (line 763) skips the cross-plugin jar-hell check for extended plugins marked ;optional=true. OpenSearch explicitly supports this shape.
  • At runtime, parent-first classloader delegation routes every class lookup to analytics-engine's copy first. sql's bundled copies are never touched when the parent is there.

The assumption this rests on: sql and analytics-engine keep the same transitive versions. Today both inherit from the OpenSearch root versions.* catalog, so guava, calcite-core, etc. match by construction. If they ever drift, parent-first would silently pick analytics-engine's version — if sql compiled against a newer API, it would break at call time. A future CI check comparing transitive sets would catch drift early.

Alternatives considered: keeping the exclude block conditional on build-time knowledge of deployment shape isn't viable — Gradle doesn't know whether the target cluster has analytics-engine. Shipping two zips (standalone vs with-analytics) isn't supported by the OpenSearch plugin distribution model. Removing the block is the cleanest option that actually delivers standalone install.

End-to-end test procedure

Verified on a 2-node sandbox cluster (OpenSearch main + analytics-engine from #21501).

Prerequisites

# 1. Build the native Rust library (needed for sandbox native-bridge module)
cd ~/code/opensearch/OpenSearch   # or your OpenSearch checkout
./gradlew :sandbox:libs:dataformat-native:buildRustLibrary -Dsandbox.enabled=true

# 2. Build and publish SQL plugin to mavenLocal
cd ~/code/opensearch/sql          # this branch
./gradlew :opensearch-sql-plugin:publishToMavenLocal

Start cluster

cd ~/code/opensearch/OpenSearch
./gradlew run -PnumNodes=2 -Dsandbox.enabled=true \
  -PinstalledPlugins="['opensearch-job-scheduler:3.7.0.0-SNAPSHOT', \
    'arrow-flight-rpc', 'analytics-engine', 'parquet-data-format', \
    'analytics-backend-datafusion', 'analytics-backend-lucene', \
    'composite-engine', 'opensearch-sql-plugin:3.7.0.0-SNAPSHOT']" \
  -Dtests.jvm.argline="-Dopensearch.experimental.feature.pluggable.dataformat.enabled=true \
    -Dopensearch.experimental.feature.transport.stream.enabled=true"

Verify Calcite/Janino path (Lucene index)

These UDFs trigger SqlFunctions.<clinit> which loads LevenshteinDistance (commons-text) via Janino code-gen — the classloader path that requires both plugins to bundle their own deps correctly.

# Create a regular Lucene index
curl -s -X PUT localhost:9200/t -H 'Content-Type: application/json' -d '{
  "settings": {"number_of_shards":1, "number_of_replicas":0},
  "mappings": {"properties": {"name": {"type":"keyword"}, "value": {"type":"integer"}}}
}'
curl -s -X POST "localhost:9200/t/_bulk?refresh=true" -H 'Content-Type: application/x-ndjson' -d '
{"index":{}}
{"name":"item1","value":10}
{"index":{}}
{"name":"item2","value":20}
'

# Queries that exercise Janino/SqlFunctions
curl -s -XPOST 'localhost:9200/_plugins/_ppl' -H 'Content-Type: application/json' \
  -d '{"query": "source=t | eval h = md5(name) | fields name, h"}'
# Expected: md5 hashes (cabf67b..., 235bac...)

curl -s -XPOST 'localhost:9200/_plugins/_ppl' -H 'Content-Type: application/json' \
  -d '{"query": "source=t | eval u = upper(name) | fields name, u"}'
# Expected: ITEM1, ITEM2

Verify Mustang path (parquet index)

curl -s -X PUT localhost:9200/parquet_test -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards":2, "number_of_replicas":0,
    "index.pluggable.dataformat.enabled": true,
    "index.pluggable.dataformat": "composite",
    "index.composite.primary_data_format": "parquet",
    "index.composite.secondary_data_formats": []
  },
  "mappings": {"properties": {"value": {"type":"integer"}}}
}'

for i in $(seq 0 19); do printf '{"index":{"_id":"%s"}}\n{"value":7}\n' $i; done | \
  curl -s -X POST "localhost:9200/parquet_test/_bulk?refresh=true" \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
curl -s -X POST "localhost:9200/parquet_test/_flush"

curl -s -XPOST 'localhost:9200/_plugins/_ppl' -H 'Content-Type: application/json' \
  -d '{"query": "source=parquet_test | stats sum(value) as total"}'
# Expected: total = 140

Results

Query Path Result
md5(name) Calcite → Janino → SqlFunctions cabf67b...
upper(name) Calcite → Janino → SqlFunctions ITEM1
stats sum(value) on parquet Mustang → DataFusion 140

All pass — SQL plugin bundles its own calcite/commons-text/guava and the classloader resolves correctly in both configurations.

Not in this PR

  • The routing heuristic (table name prefix parquet_) is unchanged. A proper setting-based or data-format-based trigger is a separate discussion.
  • No changes on the analytics-engine side or in core OpenSearch. The mechanism uses only existing platform features.
  • No CI guard for transitive version consistency between sql and analytics-engine. Worth adding as a follow-up; today it's a manual convention.

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed off (DCO)

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e271ff6.

PathLineSeverityDescription
plugin/build.gradle164highDependency scope changed from 'compileOnly' to 'api' for analytics-framework-3.7.0-SNAPSHOT.jar. This promotes a previously compile-only local artifact to a transitive API dependency, altering the plugin's dependency graph and class loading at runtime. Mandatory flag: dependency configuration change.
plugin/build.gradle58highextendedPlugins declaration changed from 'analytics-engine' to 'analytics-engine;optional=true'. This modifies how the plugin loader resolves and links the analytics-engine plugin at startup, potentially altering the shared classloader chain. Mandatory flag: build plugin/dependency configuration change.
plugin/build.gradle60mediumThe entire bundlePlugin exclusion block (20+ jar exclusions preventing jar hell with analytics-engine's classloader) has been removed. Without these exclusions, the SQL plugin bundle may now include duplicate classes previously provided by analytics-engine, risking ClassLoader conflicts or silently shadowing classes at runtime. This warrants verification that the optional-dependency refactoring fully accounts for the prior deduplication logic.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 2 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

sql plugin previously declared analytics-engine as a hard dependency:

    extendedPlugins = ['opensearch-job-scheduler', 'analytics-engine']

Installing opensearch-sql on a distribution that doesn't ship
analytics-engine failed with:

    Missing plugin [analytics-engine], dependency of [opensearch-sql]

Marking the dep ;optional=true alone is not enough — TransportPPLQueryAction
Guice-injects QueryPlanExecutor on its constructor, and Guice's OpenSearch
fork rejects a required constructor parameter whose binding is missing at
injector-build time ("constructors cannot be optional").

Move QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter. Guice invokes the setter only when a binding
for QueryPlanExecutor<RelNode, Iterable<Object[]>> exists — i.e. when
analytics-engine's createGuiceModules has run and bound DefaultPlanExecutor.
Absent analytics-engine, the setter is silently skipped,
unifiedQueryHandler stays null, and all PPL queries route to the v2
Calcite-to-OpenSearch path already in the sql plugin.

Drop the bundlePlugin exclude list. OpenSearch's jar-hell check skips the
extended-plugin cross-check when the dep is marked optional
(PluginsService.java:763), so sql can bundle every jar it needs to run
self-sufficiently. When analytics-engine is installed, parent-first
classloader delegation still lets analytics-engine's copies win for any
shared class; sql's bundled copies sit idle.

Promote analytics-framework.jar from compileOnly to api so
QueryPlanExecutor is reachable from sql's own classloader when the plugin
is absent. analytics-engine.jar stays compileOnly (required only for
OpenSearchSchemaBuilder, which lives in the engine plugin and is reached
only through RestUnifiedQueryAction — a class that never loads when the
setter is skipped).

Validated on a live 2-node cluster in both configurations:

- With analytics-engine installed: legacy and analytics PPL both return
  expected rows; routing to the analytics path still fires for
  parquet_-prefixed indices.
- Without analytics-engine (only opensearch-job-scheduler + opensearch-sql
  installed): cluster starts cleanly, PPL and SQL queries against Lucene
  indices return expected rows, parquet_-prefixed lookups return a clean
  IndexNotFoundException instead of a NullPointerException or
  NoClassDefFoundError.

Signed-off-by: bowenlan-amzn <[email protected]>
* Absent analytics-engine, Guice silently skips the call; {@link #unifiedQueryHandler}
* stays null and all PPL routing falls through to the legacy engine.
*/
@Inject(optional = true)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work?

@bowenlan-amzn bowenlan-amzn added the enhancement New feature or request label May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants